A Fast K-prototypes Algorithm Using Partial Distance Computation
Abstract
k-means is one of the most popular and widely used clustering algorithms; however, it is limited to numeric data. The k-prototypes algorithm is a well-known algorithm for clustering data with both numeric and categorical attributes. However, there have been no studies on accelerating the k-prototypes algorithm. In this paper, we propose a new fast k-prototypes algorithm that gives the same answer as the original k-prototypes algorithm. The proposed algorithm avoids distance computations by using partial distance computation: it finds the nearest cluster center for an object without computing the distance over all attributes, which reduces the computational cost. Partial distance computation relies on the fact that the difference between two values of a categorical attribute is at most 1. If data objects have m categorical attributes, the categorical part of the distance between an object and a cluster center is therefore at most m. Our algorithm first computes distances using only the numeric attributes. If the difference between the smallest and the second-smallest numeric distance is greater than m, the nearest cluster center is already determined, and the categorical attributes need not be compared at all. Experiments show that the proposed k-prototypes algorithm improves computational performance over the original k-prototypes algorithm on our dataset.
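The pruning rule described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `assign_cluster`, the representation of centers as `(numeric, categorical)` pairs, the use of squared Euclidean distance for the numeric part, and the categorical weight `gamma` (set to 1 so that the bound equals m, as in the abstract) are all assumptions made for the example.

```python
def assign_cluster(x_num, x_cat, centers, gamma=1.0):
    """Assign one object to its nearest center (hypothetical helper).

    centers is a list of (numeric_values, categorical_values) pairs.
    Returns (index of nearest center, True if the categorical part was skipped).
    """
    # Numeric part only: squared Euclidean distance to every center (assumed metric).
    d_num = [sum((a - b) ** 2 for a, b in zip(x_num, c_num)) for c_num, _ in centers]
    order = sorted(range(len(centers)), key=d_num.__getitem__)
    best = order[0]
    m = len(x_cat)  # number of categorical attributes

    # The categorical part of any distance lies in [0, gamma * m]. If the gap between
    # the two smallest numeric distances exceeds that bound, no categorical mismatch
    # pattern can change which center is nearest, so the comparison is skipped.
    if len(order) > 1 and d_num[order[1]] - d_num[best] > gamma * m:
        return best, True

    # Otherwise fall back to the full mixed distance (simple matching on categories).
    d_full = [
        d_num[i] + gamma * sum(a != b for a, b in zip(x_cat, c_cat))
        for i, (_, c_cat) in enumerate(centers)
    ]
    return min(range(len(centers)), key=d_full.__getitem__), False


# Two centers with two numeric and two categorical attributes each (made-up values).
centers = [((0.0, 0.0), ("red", "s")), ((10.0, 10.0), ("blue", "s"))]
print(assign_cluster((0.5, 0.2), ("blue", "s"), centers))  # (0, True): numeric gap exceeds m = 2
print(assign_cluster((5.0, 5.0), ("red", "s"), centers))   # (0, False): gap is 0, full distance needed
```

Because the shortcut is taken only when the numeric gap strictly exceeds the largest possible categorical contribution, the assignment is identical to the one a full distance computation would produce, which is consistent with the abstract's claim that the fast algorithm gives the same answer as the original.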
Similar resources
An Effective Montogomery Algorithm Using Multiplier Circuits
Modular exponentiation is the cornerstone computation in public key cryptography systems such as RSA cryptosystems. The operation is time consuming for large operands. This paper describes the characteristics of three architectures designed to implement modular exponentiation using the fast binary method: the first field programmable gate array (FPGA) prototype has a sequential architecture, th...
Fast Classification with Binary Prototypes
In this work, we propose a new technique for fast k-nearest neighbor (k-NN) classification in which the original database is represented via a small set of learned binary prototypes. The training phase simultaneously learns a hash function which maps the data points to binary codes, and a set of representative binary prototypes. In the prediction phase, we first hash the query into a binary cod...
Scalable Image Annotation by Summarizing Training Samples into Labeled Prototypes
As the number of images increases, it is essential to provide fast search methods and intelligent filtering of images. To handle images in large datasets, some relevant tags are assigned to each image to describe its content. Automatic Image Annotation (AIA) aims to automatically assign a group of keywords to an image based on visual content of the image. AIA frameworks have two main sta...
The Design of a Nearest-Neighbor Classifier and Its Use for Japanese Character Recognition
The nearest neighbor (NN) approach is a powerful nonparametric technique for pattern classification tasks. Although the brute-force NN algorithm is simple and has high accuracy, its computation cost is usually very expensive, especially for applications such as Japanese character recognition in which the number of categories is large. Many methods have been proposed to improve the efficiency of NN...
A modification of the LAESA algorithm for approximated k-NN classification
Nearest-neighbour (NN) and k-nearest-neighbours (k-NN) techniques are widely used in many pattern recognition classification tasks. The linear approximating and eliminating search algorithm (LAESA) is a fast NN algorithm which does not assume that the prototypes are defined in a vector space; it only makes use of some of the distance properties (mainly the triangle inequality) in order to avoid...
Journal title:
- Symmetry
Volume 9
Publication date: 2017